1. Almost without exception, the interesting things we know might not be so. They
might not be so in the very strong sense that the statements expressing them might, to our
shock and surprise, turn out to be false. The exceptions are statements, like mathematical
theorems, that are interesting precisely because they cannot be false.

This would suggest that a fundamental concern of knowledge representation should
be the treatment of uncertainty. A number of approaches to uncertainty might be
considered: there is the purely Bayesian approach, in which one assigns probabilities
[Cheeseman, 1985], [Pearl, 1988]; there are various alternative numerical measures that
have been proposed [Shafer, 1976, 1987], [Zadeh, 1975], [Shortliffe, 1976]; higher-order
probabilities have been suggested [Domotor, 1980], [Skyrms, 1980]; and there is a wide
variety of non-monotonic formalisms that might be used to capture the uncertainty of
inference, if not the uncertainty of knowledge [McCarthy, 1980, 1987], [Reiter, 1980],
[McDermott, 1980], etc.

The  relations  among  these  approaches  have  been  discussed  in  a  number  of  places 
[Kyburg,  1987,  1988a,  1988b,  1988c].  We  do  not  propose  to  discuss  these  relations 
further  here,  but  simply  to  adopt  an  interval-valued  epistemic  notion  of  probability  (which 
we  shall  briefly  characterize  in  the  next  section)  and  to  show  how  this  approach  can  be  used 
for  inference,  decision-making,  evidential  and  inductive  reasoning,  and  commonsense 
reasoning,  as  well  as  nonmonotonic  reasoning. 
2. Since the membership of statements in these bodies of knowledge depends on
probability, we had best begin with a brief characterization of the sense of probability we
are employing. We construe probability as objective, and not subjective. But we
specifically think of probability as epistemic: that is, it concerns individual cases, and not
merely classes of cases.

Probabilities  are  assigned  to  statements,  relative  to  a  body  of  evidence  -  what  I 
have  called  the  evidential  corpus.  We  require  statistical  knowledge  (not  just  statistical 
evidence)  as  a  basis  for  every  probability  statement.  Two  facts  render  this  constraint 
acceptable:  It  doesn't  take  much  statistical  data  to  yield  an  approximate  statistical 
hypothesis.  And  if  we  adopt  the  principle  that  statements  known  to  have  the  same  truth 
value  are  to  be  assigned  the  same  probability,  we  may  link  many  statements  to  the  same 
statistical  foundation. 

Many  people  lament  the  fact  that  we  do  not  have  the  statistical  knowledge  to  use 
probabilities  (McCarthy  and  Hayes,  1969).  In  fact  the  opposite  is  the  case.  Once  you 
admit  the  linkage  among  statements  known  to  have  the  same  truth  value,  and  once  you 
admit  approximate  probabilities,  the  difficulty  is  to  choose  the  appropriate  reference  class 
among  a  possibly  large  number  of  potential  candidates. 

Two principles suffice to perform this selection. They include as a special case the
various principles of maximum specificity that have been proposed both in non-monotonic
logic and in the philosophy of scientific explanation [Etherington, 1987], [Horty, 1987],
[Poole, 1985]. The principles include two other cases that have not been noted in the AI
literature.

We assume, as usual, a formal language, and a fixed body of knowledge. A
sentence S of our language determines a class of inference structures. An inference
structure is a 5-tuple of the form <ind, prop, ref. class, low, high>, where in the body of
knowledge we know "S <-> ind has prop," we know "ind is in ref. class," and the most
accurate statistical knowledge we have about the frequency of the property in the reference
class is that it lies between low and high.

Two inference structures differ if neither mentioned interval is included in the other.

Principle I: If two inference structures IS1 and IS2 differ from each other, delete
both from the original set, unless

(a) One ref. class is known to be included in the other, or

(b) [A dual condition concerning sampling] or

(c) [A condition concerning sequential experiments -- the classical "Bayesian" case]

These last two conditions are slightly complicated to state, but versions have been
offered in [Kyburg, 1961, 1974, and 1983]. The output of the application of principle I is
a reduced class of inference structures, no two of which differ. We then apply principle II.

Principle  II:  If  the  interval  mentioned  by  one  inference  structure  is  properly 
included  in  the  interval  mentioned  by  a  second  inference  structure,  delete  the  second. 

The outcome of the application of these two principles is a class of inference
structures that agree precisely. The common interval mentioned by these inference
structures is the probability of S, and also, in virtue of the use we have made of the
biconditional, of any statement we know to have the same truth value as S. This procedure
is deterministic, and in fact has been implemented in limited domains [Loui, 1986].
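
The selection procedure can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not the implementation reported in [Loui, 1986]:
conditions (b) and (c) of Principle I are omitted; when condition (a) applies we assume
that the structure with the broader reference class is the one deleted; known class
inclusions are passed in as a hypothetical set of (narrow, broad) pairs; and the
frequencies in the example are invented.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class InferenceStructure:
    ind: str          # the individual
    prop: str         # the property
    ref_class: str    # the reference class
    low: float        # lower bound on the known frequency
    high: float       # upper bound on the known frequency


def includes(outer, inner):
    """True if inner's interval lies within outer's interval."""
    return outer.low <= inner.low and inner.high <= outer.high


def differ(a, b):
    """Two structures differ if neither interval is included in the other."""
    return not includes(a, b) and not includes(b, a)


def principle_one(structures, subclass_of):
    """Delete differing pairs, except that under condition (a) only the
    structure with the broader reference class is deleted (an assumption)."""
    deleted = set()
    for a, b in combinations(structures, 2):
        if not differ(a, b):
            continue
        if (a.ref_class, b.ref_class) in subclass_of:
            deleted.add(b)
        elif (b.ref_class, a.ref_class) in subclass_of:
            deleted.add(a)
        else:
            deleted.update((a, b))
    return [s for s in structures if s not in deleted]


def principle_two(structures):
    """Delete any structure whose interval properly includes another's."""
    def properly_included(inner, outer):
        same = (inner.low, inner.high) == (outer.low, outer.high)
        return includes(outer, inner) and not same
    return [s for s in structures
            if not any(properly_included(t, s) for t in structures)]


def probability(structures, subclass_of):
    """Return the common interval of the surviving structures, if any."""
    survivors = principle_two(principle_one(structures, subclass_of))
    return (survivors[0].low, survivors[0].high) if survivors else None


bird = InferenceStructure("Tweety", "flies", "bird", 0.90, 0.95)
penguin = InferenceStructure("Tweety", "flies", "penguin", 0.0, 0.01)
trivial = InferenceStructure("Tweety", "flies", "thing", 0.0, 1.0)

print(probability([bird, trivial], set()))                          # (0.9, 0.95)
print(probability([bird, penguin, trivial], {("penguin", "bird")}))  # (0.0, 0.01)
print(probability([bird, penguin, trivial], set()))                  # (0.0, 1.0)
```

In the last call the two informative structures differ and neither class is known to be
included in the other, so both are deleted and only the uninformative interval remains.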

3.  A  knowledge  state  is  represented  by  two  sets  of  statements,  rather  than  one.  One 
set  of  statements  represents  evidence;  it  corresponds  to  recorded  data,  together  with  general 
knowledge  that  is  not  open  to  question  in  the  context  at  hand.  We  refer  to  this  as  the 
evidential  corpus  of  knowledge.  We  will  say  more  about  it  shortly. 


The other set of statements represents a body of practical certainties, based on the
statements constituting the evidential corpus. It consists of statements whose probabilities,
relative to the evidential corpus, exceed some explicit level determined by the context.
(This is to be contrasted with the idea, to be found in [Pearl, 1988], for example, that
probabilistic acceptance requires an arbitrarily high probability.) This set of statements we
will call the practical corpus.

A  statement  is  in  the  practical  corpus  just  in  case  its  probability  exceeds  a 
level  we  take  to  correspond  to  "practical  certainty"  in  a  given  context.  (For  a  suggestion  as 
to  how  that  level  might  be  determined,  see  [Kyburg,  1988d].)  This  has  the  important  and 
useful  consequence  that  the  practical  corpus  is  not  deductively  closed,  since  in  general  the 
probability  of  a  conjunction,  even  in  the  epistemic  sense,  is  less  than  (has  a  lower  bound 
less  than)  the  probability  of  either  of  its  conjuncts.  We  do  have  limited  closure: 

If S is in the practical corpus, and T is deductively implied by S, then T will also
be in it.

A further consequence that is of considerable significance is that the practical
corpus, since it is not deductively closed, may be "inconsistent." We draw the fangs of the
lottery paradox [Kyburg, 1961] by refusing to countenance deductive or conjunctive
closure. This not only allows us to have "ticket i will not win" in our corpus (for large
lotteries) but, more important, allows us to have statements of the form "the error of
measurement i is less than d" in our corpus, even when there are so many that we can be
practically certain that at least one of those measurements is in error by more than d.
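
The lottery case can be checked with a little arithmetic. The sketch below makes
simplifying assumptions that are not in the text: the lottery is fair with exactly one
winning ticket, probabilities are treated as point values rather than intervals, and
the acceptance level of 0.999 is chosen only for illustration.

```python
from fractions import Fraction


def conjunction_probability(n_tickets, k):
    """P(none of k named tickets wins) in a fair one-winner lottery."""
    return 1 - Fraction(k, n_tickets)


def accepted(probability, threshold):
    """A statement enters the practical corpus if it exceeds the level."""
    return probability > threshold


def lottery_acceptance(n_tickets, threshold):
    """Return (each single statement accepted, full conjunction accepted)."""
    each = accepted(conjunction_probability(n_tickets, 1), threshold)
    all_together = accepted(
        conjunction_probability(n_tickets, n_tickets), threshold)
    return each, all_together


level = Fraction(999, 1000)
print(lottery_acceptance(1_000_000, level))   # (True, False)
# Conjoining statements steadily lowers the probability:
print(accepted(conjunction_probability(1_000_000, 999), level))    # True
print(accepted(conjunction_probability(1_000_000, 1000), level))   # False
```

Each statement "ticket i will not win" is accepted, while the conjunction of all of them
has probability zero and is not; this is why acceptance by probability cannot be closed
under conjunction.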

If statements get in the practical corpus by being probable enough relative to the
evidential corpus, how do they get in the evidential corpus? Presumably the evidential
corpus is even more demanding than the practical corpus. And is the evidential corpus
deductively closed? In a given context, we take the contents of the evidential corpus for
granted: to ask the provenance of statements in the evidential corpus is to shift context - to
regard it as "practical" relative to a new "evidential" corpus. This suggests that we take the
structure of the evidential corpus to be the same as that of the practical corpus.

4.  Probabilities  are  defined  relative  to  the  practical  corpus  in  the  same  way  that  they 
can  be  defined  relative  to  the  evidential  corpus.  This  yields  a  natural  decision  theory.  (It  is 
weak,  due  to  the  fact  that  probabilities  are  intervals.) 
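
One way to see why interval probabilities give a weak decision theory is the following
sketch. The decision rule used here (rule out an act only when some other act's lowest
expected utility exceeds its highest) is an assumption made for illustration, not a rule
stated in this paper; the acts, utilities, and intervals are likewise invented.

```python
def expected_utility_interval(low, high, u_if_true, u_if_false):
    """Range of expected utility as P(S) varies over [low, high].

    Expected utility is linear in P(S), so the extremes occur at the
    endpoints of the probability interval.
    """
    at_low = low * u_if_true + (1 - low) * u_if_false
    at_high = high * u_if_true + (1 - high) * u_if_false
    return (min(at_low, at_high), max(at_low, at_high))


def admissible(acts):
    """Keep every act that no other act beats across the whole interval.

    acts maps an act name to its (lowest, highest) expected utility.
    """
    return [
        name for name, (_, hi) in acts.items()
        if not any(other_lo > hi
                   for other, (other_lo, _) in acts.items()
                   if other != name)
    ]


# S = "it will rain"; utilities are (if S is true, if S is false).
wide = {
    "umbrella": expected_utility_interval(0.3, 0.6, 5, 3),
    "no umbrella": expected_utility_interval(0.3, 0.6, 0, 10),
}
narrow = {
    "umbrella": expected_utility_interval(0.9, 0.95, 5, 3),
    "no umbrella": expected_utility_interval(0.9, 0.95, 0, 10),
}
print(admissible(wide))     # ['umbrella', 'no umbrella']
print(admissible(narrow))   # ['umbrella']
```

With the wide interval the expected-utility ranges overlap and neither act is ruled out;
with the narrower, higher interval one act is eliminated.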

It  is  clear  that  as  evidence  is  added  to  the  evidential  corpus,  statements  will  come 
and  go  in  the  practical  corpus,  reflecting  the  nonmonotonicity  of  ordinary  reasoning.  (The 
practical  corpus  will  be  incomplete.)  The  conventional  examples  are  easy  to  handle. 

In planning, we do not in general want to have to consider outlandish possibilities
-- the potato in the tailpipe. Outlandish possibilities are not represented in the practical
corpus: they do not represent possibilities that we should take seriously. But they can be
represented as possibilities in the evidential corpus, and an addition to that corpus can
change their probabilities, and thus lead to their significant probability relative to the
practical corpus.

In  some  planning  situations,  we  wish  to  take  advantage  of  external  inputs  to  modify 
our  plans.  In  general,  this  will  be  helpful  only  if  we  can  deal  quantitatively  with  the 
possibility  of  error  in  the  input.  The  suggested  approach  allows  this:  the  evidential  corpus 
can  contain  general  error  distributions,  from  which  we  can  infer  in  the  practical  corpus 
statements  about  errors  in  particular  cases. 

There  are  many  cases  in  which  we  want  our  system  to  take  as  fact,  ceteris  paribus, 
a  certain  statement;  and  at  the  same  time,  be  sensitive  to  the  fact  that  circumstances  can  arise 
when  ceteris  is  no  longer  paribus. 

The cost of being able to do this is that any addition to the evidential corpus may
make a crucial difference to what is contained in the practical corpus. But that difference
can only make itself felt in a change of probabilities, relative to the evidential corpus, of
statements that are relevant to the decision or goal we are concerned with. We may think of
this as vertical modularity.

We  must  also  consider  the  possibilities  of  horizontal  modularity:  there  are  some 
domains  that  are  quite  independent  of  other  domains,  ordinarily,  and  we  should  be  able  to 
take  advantage  of  those  independencies.  But  we  would  want  to  allow  the  boundaries  of 
these  domains  to  shift  as  our  evidential  corpus  changes:  it  is  always  possible  that  there  is  a 
link,  after  all,  between  the  number  of  missionaries  in  Papua  and  the  rainfall  in  South  Bend, 
and  that  we  could  discover  it  and  incorporate  it  in  our  evidential  corpus. 

1.  Research  on  which  this  work  was  based  was  partially  supported  by  the  U.  S.  Army 
Signals  Warfare  Center. 


Cheeseman, Peter (1985): "In Defense of Probability," IJCAI 85, Morgan Kaufmann,
Los Altos, 1002-1009.

Domotor, Zoltan: "Higher Order Probabilities," Philosophical Studies 40, 1980, pp 31-46.

Etherington,  D.  W.:  "Formalizing  Non-Monotonic  Reasoning  Systems,"  Artificial 
Intelligence  31,  1987, 41-86. 

Horty, John, Thomason, Richmond, and Touretzky, David: "A Skeptical Theory of
Inheritance in Non-monotonic Semantic Networks," AAAI-87, Morgan
Kaufmann, Los Altos, 1987, 358-363.

Kyburg, Henry E., Jr.: Theory and Measurement, Cambridge University Press,
Cambridge, 1984.

Kyburg,  Henry  E.,  Jr.:  "Full  Belief."  Theory  and  Decision  25,  1988d,  137-162. 

Kyburg,  Henry  E.,  Jr.:  "Higher  Order  Probabilities  and  Intervals,"  International 
Journal  of  Approximate  Reasoning  2,  1988c,  pp  195-209. 

Kyburg, Henry E., Jr.: "Probabilistic Inference and Non-Monotonic Inference,"
Shachter, Ross, and Levitt, Todd (eds): The Fourth Workshop on Uncertainty
in Artificial Intelligence, 1988a, pp 229-236.

Kyburg, Henry E., Jr.: "Probabilistic Inference and Probabilistic Reasoning,"
Shachter, Ross, and Levitt, Todd (eds): The Fourth Workshop on Uncertainty
in Artificial Intelligence, 1988b, pp 221-228.

Kyburg,  Henry  E.,  Jr.:  The  Logical  Foundations  of  Statistical  Inference,  Reidel,  1974. 

Kyburg,  Henry  E.,  Jr.:  "The  Reference  Class,"  Philosophy  of  Science  50,  1983,  pp 
374-397. 

Kyburg, Henry E., Jr.: Probability and the Logic of Rational Belief, Wesleyan
University Press, 1961.

Kyburg, Henry E., Jr.: "Bayesian and Non-Bayesian Evidential Updating," Artificial
Intelligence 31, 1987, pp 271-294.

Loui,  Ronald  P.:  "Computing  Reference  Classes,"  Proceedings  of  the  1986  Workshop 
on  Uncertainty  in  Artificial  Intelligence,  (1986)  183-188. 

McCarthy, John: "Circumscription - a Form of Non-Monotonic Reasoning," Artificial
Intelligence 13, 1980, 27-39.



McCarthy,  John,  and  Hayes,  Pat:  "Some  Philosophical  Problems  from  the  Standpoint 
of  Artificial  Intelligence,"  Machine  Intelligence  4, 1969, 463-502,  reprinted  in 
Weber  and  Nilsson  (eds)  Readings  in  Artificial  Intelligence,  Tioga  Publishing, 
Palo  Alto,  1981. 

McCarthy,  John:  "Applications  of  Circumscription  to  Formalizing  Common-Sense 
Knowledge,"  Artificial  Intelligence  28,  (1986)  89-116. 

McDermott,  D.,  and  Doyle,  J.  (1980):  "Non-Monotonic  Logic  I,"  Artificial  Intelligence 
13,  41-72. 

Pearl,  Judea:  Probabilistic  Reasoning  in  Intelligent  Systems:  Networks  of  Plausible 
Inference,  Morgan  Kaufmann,  San  Mateo,  1988. 

Poole,  David  L.:  "On  the  Comparison  of  Theories:  Preferring  the  Most  Specific 
Explanation,"  IJCAI  85,  Morgan  Kaufmann,  Los  Altos,  1985,  144-147. 

Reiter,  R.  (1980):  "A  Logic  for  Default  Reasoning,"  Artificial  Intelligence  13,  81-132. 

Shafer,  Glenn:  A  Mathematical  Theory  of  Evidence,  Princeton  University  Press, 
Princeton,  1976 

Shafer,  Glenn:  "Probability  Judgment  in  Artificial  Intelligence  and  Expert  Systems," 
Statistical  Science  2,  1987, 3-44. 

Shortliffe,  E.H.,  Computer-Based  Medical  Consultations:  Mycin,  American  Elsevier, 
New  York,  1976. 

Skyrms,  Brian:  "Higher  Order  Degrees  of  Belief,"  in  Hugh  Mellor  (ed)  Prospects  for 
Pragmatism,  Cambridge  University  Press,  Cambridge,  1980,  pp  109-138. 

Zadeh, Lotfi A.: "Fuzzy Logic and Approximate Reasoning," Synthese 30, 1975, 407-428.




